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Abstract. This paper seeks to bridge the two major algorithmic approaches to 
sparse signal recovery from an incomplete set of linear measurements - Li-mini- 
mization methods and iterative methods (Matching Pursuits). We find a simple 
regularized version of Orthogonal Matching Pursuit (ROMP) which has advan- 
tages of both approaches: the speed and transparency of OMP and the strong 
uniform guarantees of Li-minimization. Our algorithm ROMP reconstructs a 
sparse signal in a number of iterations linear in the sparsity, and the reconstruc- 
tion is exact provided the linear measurements satisfy the Uniform Uncertainty 
Principle. 



Sparse recovery problems arise in many applications ranging from medical imaging 
to error correction. Suppose v is an unknown (^-dimensional signal with at most 
n d nonzero components: 



We call such signals n-sparse. Suppose we are able to collect N <ti d nonadaptive 
linear measurements of v, and wish to efficiently recover v from these. The mea- 
surements are given as the vector $t> e M, N , where <3> is some N x d measurement 
matrix 

As discussed in [2], exact recovery is possible with just N = 2n. However, recovery 
using only this property is not numerically feasible; the sparse recovery problem in 
general is known to be NP-hard. Nevertheless, massive recent work in the emerging 
area of Compressed Sensing demonstrated that for several natural classes of measure- 
ment matrices $, the signal v can be exactly reconstructed from its measurements 
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<&v with 



(1.1) 



N = nlog° w (d). 



In other words, the number of measurements N -C d should be almost linear in 
the sparsity n. Survey [JJ contains some of these results; the Compressed Sensing 
webpage [6] documents progress in this area. 

The two major algorithmic approaches to sparse recovery are methods based on 
Lx-minimization and iterative methods (Matching Pursuits). We now briefly de- 
scribe these methods. Then we propose a new iterative method that has advantages 
of both approaches. 

1.1. Lx-minimization. This approach to sparse recovery has been advocated over 
decades by Donoho and his collaborators (see e.g. [ID]). The sparse recovery prob- 
lem can be stated as the problem of finding the sparsest signal v with the given 
measurements Qv: 

(Lq) min||w||o subject to $w = $>v 

where ||w||o := |supp(w)|. Donoho and his associates advocated the principle that for 
some measurement matrices <E>, the highly non-convex combinatorial optimization 
problem (Lq) should be equivalent to its convex relaxation 

(Li) min||tt||i subject to $w = <&v 

where ||w||i = £\ \ u i\ denotes the ^-norm of the vector u = (u%, . . . , w rf ). The convex 
problem (L\) can be solved using methods of convex and even linear programming. 

The recent progress in the emerging area of Compressed Sensing pushed forward 
this program (see survey [1]). A necessary and sufficient condition of exact sparse 
recovery is that the map $ be one-to-one on the set of n-sparse vectors. Candes and 
Tao [5J proved that a stronger quantitative version of this condition guarantees the 
equivalence of the problems (Lq) and (Li). 

Definition 1.1 (Restricted Isometry Condition). A measurement matrix $ satisfies 
the Restricted Isometry Condition (RIC) with parameters (m, e) for s G (0, 1) if we 
have 



The Restricted Isometry Condition states that every set of m columns of $ forms 
approximately an orthonormal system. One can interpret the Restricted Isometry 
Condition as an abstract version of the Uniform Uncertainty Principle in harmonic 
analysis ([I], see also discussions in [3] and [To]). 

Theorem 1.2 (Sparse recovery under RIC [5]). Assume that the measurement ma- 
trix $ satisfies the Restricted Isometry Condition with parameters (3n, 0.2). Then 
every n-sparse vector x can be exactly recovered from its measurements §x as a 
unique solution to the convex optimization problem (Li). 



(l-e)H| 2 < HHh < (1 + £)|M| 2 



for all m-sparse vectors. 



In a lecture on Compressive Sampling, Candes sharpened this to work for the 
Restricted Isometry Condition with parameters (2n, y/2 — 1). Measurement matrices 
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that satisfy the Restricted Isometry Condition with number of measurements as in 
(11. ip include random Gaussian, Bernoulli and partial Fourier matrices. Section [2] 
contains more detailed information. 

1.2. Orthogonal Matching Pursuit (OMP). An alternative approach to sparse 
recovery is via iterative algorithms, which find the support of the n-sparse signal v 
progressively. Once S = supp(t> ) is found correctly, it is easy to compute the signal v 
from its measurements x = $t> as v = where $5 denotes the measurement 
matrix $ restricted to columns indexed by S. 

A basic iterative algorithm is Orthogonal Matching Pursuit (OMP), popularized 
and analyzed by Gilbert and Tropp in [21 J, see [22j for a more general setting. OMP 
recovers the support of v, one index at a time, in n steps. Under a hypothetical 
assumption that $ is an isometry, i.e. the columns of $ are orthonormal, the signal 
v can be exactly recovered from its measurements x = as v = 

The problem is that the N x d matrix $ is never an isometry in the interesting 
range where the number of measurements N is smaller than the ambient dimension 
d. Even though the matrix is not an isometry, one can still use the notion of 
coherence in recovery of sparse signals. In that setting, greedy algorithms are used 
with incoherent dictionaries to recover such signals, see [8], [9], [13]. In our setting, 
for random matrices one expects the columns to be approximately orthogonal, and 
the observation vector u = Q*x to be a good approximation to the original signal v. 

The biggest coordinate of the observation vector u in magnitude should thus be 
a nonzero coordinate of the signal v. We thus find one point of the support of v. 
Then OMP can be described as follows. First, we initialize the residual r = x. At 
each iteration, we compute the observation vector u = $*r. Denoting by I the 
coordinates selected so far, we solve a least squares problem and update the residual 

y = argmin \\x — *&z\\2', r = x — 

to remove any contribution of the coordinates in /. OMP then iterates this procedure 
n times, and outputs a set / of size n, which should equal the support of the signal 
v. 

Tropp and Gilbert [21] analyzed the performance of OMP for Gaussian measure- 
ment matrices $; a similar result holds for general subgaussian matrices. They 
proved that, for every fixed n-sparse (^-dimensional signal v, and an N x d ran- 
dom Gaussian measurement matrix $, OMP recovers (the support of) v from the 
measurements x = $u correctly with high probability, provided the number of mea- 
surements is ~ n log d. 

1.3. Advantages and challenges of both approaches. The Li-minimization 

method has strongest known guarantees of sparse recovery. Once the measurement 
matrix $ satisfies the Restricted Isometry Condition, this method works correctly for 
all sparse signals v. No iterative methods have been known to feature such uniform 
guarantees, with the exception of Chaining Pursuit [2] and the HHS Algorithm 
[T5] which however only work with specifically designed structured measurement 
matrices. 
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The Restricted Isometry Condition is a natural abstract deterministic property 
of a matrix. Although establishing this property is often nontrivial, this task is 
decoupled from the analysis of the recovery algorithm. 

Li-minimization is based on linear programming, which has its advantages and 
disadvantages. One thinks of linear programming as a black box, and any develop- 
ment of fast solvers will reduce the running time of the sparse recovery method. On 
the other hand, it is not very clear what this running time is, as there is no strongly 
polynomial time algorithm in linear programming yet. All known solvers take time 
polynomial not only in the dimension of the program d, but also on certain condition 
numbers of the program. While for some classes of random matrices the expected 
running time of linear programming solvers can be bounded (see the discussion in 
[20J and subsequent work in [23]), estimating condition numbers is hard for specific 
matrices. For example, there is no result yet showing that the Restricted Isometry 
Condition implies that the condition numbers of the corresponding linear program 
is polynomial in d. 

Orthogonal Matching Pursuit is quite fast, both theoretically and experimentally. 
It makes n iterations, where each iteration amounts to a multiplication by a d x iV 
matrix $* (computing the observation vector u), and solving a least squares problem 
in dimensions at most N x n (with matrix This yields strongly polynomial 

running time. In practice, OMP is observed to perform faster and is easier to 
implement than Li-minimization |21j. For more details, see [2~I] . 

Orthogonal Matching Pursuit is quite transparent: at each iteration, it selects 
a new coordinate from the support of the signal v in a very specific and natural 
way. In contrast, the known Li-minimization solvers, such as the simplex method 
and interior point methods, compute a path toward the solution. However, the 
geometry of L\ is clear, whereas the analysis of greedy algorithms can be difficult 
simply because they are iterative. 

On the other hand, Orthogonal Matching Pursuit has weaker guarantees of exact 
recovery. Unlike Li-minimization, the guarantees of OMP are non-uniform: for 
each fixed sparse signal v and not for all signals, the algorithm performs correctly 
with high probability. Rauhut has shown that uniform guarantees for OMP are 
impossible for natural random measurement matrices [18J. 




Moreover, OMP's condition on measurement matrices given in (21] is more restric- 
tive than the Restricted Isometry Condition. In particular, it is not known whether 
OMP succeeds in the important class of partial Fourier measurement matrices. 

These open problems about OMP, first stated in [2T] and often reverberated in the 
Compressed Sensing community, motivated the present paper. We essentially settle 
them in positive by the following modification of Orthogonal Matching Pursuit. 

1.4. Regularized OMP. This new algorithm for sparse recovery will perform cor- 
rectly for all measurement matrices $ satisfying the Restricted Isometry Condition, 
and for all sparse signals. 

When we are trying to recover the signal v from its measurements x = <&v, we can 
use the observation vector u = $*x as a good local approximation to the signal v. 
Namely, the observation vector u encodes correlations of the measurement vector x 
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with the columns of $. Note that $ is a dictionary, and so since the signal v is sparse, 
x has a sparse representation with respect to the dictionary. By the Restricted 
Isometry Condition, every n columns form approximately an orthonormal system. 
Therefore, every n coordinates of the observation vector u look like correlations of 
the measurement vector x with the orthonormal basis and therefore are close in the 
Euclidean norm to the corresponding n coefficients of v. This is documented in 
Proposition 13.21 below. 

The local approximation property suggests to make use of the n biggest coordi- 
nates of the observation vector u, rather than one biggest coordinate as OMP did. 
We thus force the selected coordinates to be more regular (ie. closer to uniform) by 
selecting only the coordinates with comparable sizes. To this end, a new regulariza- 
tion step will be needed to ensure that each of these coordinates gets an even share 
of information. This leads to the following algorithm for sparse recovery: 

Regularized Orthogonal Matching Pursuit (ROMP) 

Input: Measurement vector x G M. N and sparsity level n 
Output: Index set Id {1, . . . , d} 

Initialize: Let the index set / = and the residual r = x. 

Repeat the following steps until r = 0: 
Identify: Choose a set J of the n biggest coordinates in magnitude of the 

observation vector u = $*r, or all of its nonzero coordinates, whichever 

set is smaller. 

Regularize: Among all subsets Jo C J with comparable coordinates: 

\u(i)\ < 2\u(j)\ for all i,j G J , 

choose Jo with the maximal energy ||w|j ||2- 
Update: Add the set Jo to the index set: I <— / U Jo, and update the 
residual: 

y = argmin ||x — ^z^; r = x — $?/. 

zeK 1 



Remark. The identification and regularization steps of ROMP can be performed 
efficiently. In particular, the regularization step does not imply combinatorial com- 
plexity, but actually can be done in linear time. The running time of ROMP is thus 
comparable to that of OMP in theory, and is often better than OMP in practice. 
We discuss the runtime in detail in Section HI 

The main theorem of this paper states that ROMP yields exact sparse recovery 
provided that the measurement matrix satisfies the Restricted Isometry Condition. 

Theorem 1.3 (Exact sparse recovery via ROMP). Assume a measurement ma- 
trix $ satisfies the Restricted Isometry Condition with parameters (2n, e) for e = 
0.03/v^logn. Let v be an n-sparse vector in ~R d with measurements x = $u. Then 
ROMP in at most n iterations outputs a set I such that 

supp(f) C / and \I\ < 2n. 
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This theorem is proved in Section [3j 

Remarks. 1. Theorem II. 31 guarantees exact sparse recovery. Indeed, it is easy to 
compute the signal v from its measurements x — $>v and the set I given by ROMP 
as v — (&i)~ l x, where denotes the measurement matrix $ restricted to columns 
indexed by /. 

2. Theorem 11.31 gives uniform guarantees of sparse recovery. Indeed, once the 
measurement matrix satisfies a deterministic condition (RIC), then our algorithm 
ROMP correctly recovers every sparse vector from its measurements. Uniform guar- 
antees have been shown to be impossible for OMP [18J, and it has been an open 
problem to find a version of OMP with uniform guarantees (see |21j). Theorem 11.31 
says that ROMP essentially settles this problem. 

3. The logarithmic factor in e may be an artifact of the proof. At this moment, 
we do not know how to remove it. 

4. Measurement matrices known to satisfy the Restricted Isometry Condition 
include random Gaussian, Bernoulli and partial Fourier matrices, with number of 
measurements iV almost linear in the sparsity n, i.e. as in (11.11) . Section [2] contains 
detailed information. It has been unknown whether OMP gives sparse recovery for 
partial Fourier measurements (even with non-uniform guarantees). ROMP gives 
sparse recovery for these measurements, and even with uniform guarantees. 

The rest of the paper is organized as follows. In Section [2] we describe known 
classes of measurement matrices satisfying the Restricted Isometry Condition. In 
Section [3] we give the proof of Theorem II .31 In Section H] we discuss implementation, 
running time, and empirical performance of ROMP. 

Acknowledgment. We would like to thank the referees for a thorough reading of 
the manuscript and making useful suggestions which greatly improved the paper. 

2. Measurement matrices satisfying the Restricted Isometry 

Condition 

The only known measurement matrices known to satisfy the Restricted Isometry 
Condition with number of measurements as in (11.11) are certain classes of random ma- 
trices. The problem of deterministic constructions is still open. The known classes 
include: subgaussian random matrices (in particular, Gaussian and Bernoulli), and 
random partial bounded orthogonal matrices (in particular, partial Fourier matri- 
ces). 

Throughout the paper, C, c, C\, C2, c±, C2, ■ ■ ■ denote positive absolute constants 
unless otherwise specified. 

A subgaussian random matrix $ is a matrix whose entries are i.i.d. subgaussian 
random variables with variance 1. A random variable X is subgaussian if its tail 
distribution is dominated by that of the standard Gaussian random variable: there 
are constants C\,c\ > such that P(|A| > t) < C*iexp(— cit 2 ) for all t > 0. Ex- 
amples of subgaussian random variables are: standard Gaussian, Bernoulli (uniform 
±1), and any bounded random variables. 
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A partial bounded orthogonal matrix $ is formed by N randomly uniformly chosen 
rows of an orthogonal d x d matrix whose entries are bounded by C2/V0I, for 
some constant G%. An example of ^ is the discrete Fourier transform matrix. Taking 
measurements <&v with a partial Fourier matrix thus amounts to observing N random 
frequencies of the signal v. 

The following theorem documents known results on the Restricted Isometry Con- 
dition for these classes of random matrices. 

Theorem 2.1 (Measurement matrices satisfying RIC). Consider an N x d mea- 
surement matrix $, and let n > 1, e £ (0, 1/2), and 5 G (0, 1). 

1. If $ is a subgaussian matrix, then with probability 1 — 5 the matrix 

satisfies the Restricted Isometry Condition with parameters (n, e) provided that 

Cn , / d \ 

2. //$ is a partial bounded orthogonal matrix, then with probability 1 — 5 the matrix 
d $ satisfies the Restricted Isometry Condition with parameters (n, e) provided 



N 

that 



N>Ci^)lo g (^)^d. 



e 2 J & V e 2 



In both cases, the constant C depends only on the confidence level S and the constants 
Ci,Ci, C2 from the definition of the corresponding classes of matrices. 

Remarks. 1. The first part of this theorem is proved in [TTJ . The second part 
is from [19] ; a similar estimate with somewhat worse exponents in the logarithms 
was proved in [1] . See these results for the exact dependence of C on the confidence 
level 5 (although usually 5 would be chosen to be some small constant itself.) 

2. In Theorem ll.3[ we needed to use RIC for e = ci/y/logn. An immediate 
consequence of Theorem 12.11 is that subgaussian matrices satisfy such RIC for the 
number of measurements 

iV ~ n log 2 d 
and partial bounded orthogonal matrices for 

iV ~ n log 5 d. 

These numbers of measurements guarantee exact sparse recovery using ROMP. 



3. Proof of Theorem 11.31 

We shall prove a stronger version of Theorem II. 3\ which states that at every 
iteration of ROMP, at least 50% of the newly selected coordinates are from the 
support of the signal v. 

Theorem 3.1 (Iteration Invariant of ROMP). Assume $ satisfies the Restricted 
Isometry Condition with parameters (2n, e) for e = 0.03/ y/logn. Let v 7^ be an 
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n-sparse vector with measurements x = <&v. Then at any iteration of ROMP, after 
the regularization step, we have J ^ 0, J H / = and 

(3.1) |J nsupp(u)| > -|J |. 

In other words, at least 50% of the coordinates in the newly selected set Jq belong to 
the support of v. 

In particular, at every iteration ROMP finds at least one new coordinate in the 
support of the signal v. Coordinates outside the support can also be found, but 
(13. ip guarantees that the number of such "false" coordinates is always smaller than 
those in the support. This clearly implies Theorem 11.31 

Before proving Theorem 13.11 we explain how the Restricted Isometry Condition 
will be used in our argument. RIC is necessarily a local principle, which concerns 
not the measurement matrix $ as a whole, but its submatrices of n columns. All 
such submatrices $/, / C {1, . . . ,d}, \I\ < n are almost isometries. Therefore, for 
every n-sparse signal v, the observation vector u = approximates v locally, 

when restricted to a set of cardinality n. The following proposition formalizes these 
local properties of $ on which our argument is based. 

Proposition 3.2 (Consequences of Restricted Isometry Condition). Assume a mea- 
surement matrix $ satisfies the Restricted Isometry Condition with parameters (2n, e) . 
Then the following holds. 

(1) (Local approximation) For every n-sparse vector n6K li and every set I C 
{1, . . . , d}, \I\ < n, the observation vector u = satisfies 

— v \i\\2 < 2.03e||f H2. 

(2) (Spectral norm) For any vector z e M. N and every set I C {1, . . . , d}, \I\ < 
2n, we have 

||(#**)|/||2<(l+e)W a . 

(3) (Almost orthogonality of columns) Consider two disjoint sets I, J C {1, . . . , d}, 
I J U J\ < In. Let Pi,Pj denote the orthogonal projections in M. N onto 
range($/) and range ($j), respectively. Then 

||iW||2-2 < 2.2e. 

Proof. Part 1. Let T = / U supp(t> ), so that |T| < In. Let Idr denote the identity 
operator on R r . By the Restricted Isometry Condition, 

||^$r-Id r || 2 -, 2 = sup | \\^vyf 2 - \\y\\l\ < (1 + ef - 1 < 2.03e. 

yeR r , ||y|| 2 =i 

Since supp(w) C T, we have 

\\u\r — u|r||2 = ||$r$r^ — Idw || 2 < 2.03e: || -?7 1| 2 . 
The conclusion of Part 1 follows since IcT. 



Part 2. Denote by Qi the orthogonal projection in M. d onto M / . Since |/| < 2n, 
the Restricted Isometry Condition yields 

\\Qi$*h^2 = \\&Qih-*2 <l + e. 
This yields the inequality in Part 2. 

Part 3. The desired inequality is equivalent to: 

(x V/ 

- — — — < 2.2s for all x G range($/), y G range($ j). 

IMI2IMI2 

Let K = I U J so that \K\ < 2n. For any x G range($/), y G range($j), there are 
a, 6 so that 

x = $ K a, y = a Gf 1 , 6 G M J . 

By the Restricted Isometry Condition, 

W 2 >(l-e)||a|| 2> ||y|| 2 >(l-e)||6||2. 
By the proof of Part 2 above and since (ab) = 0, we have 

\(x,y)\ = \{($* K * K -Id r )o,6)| < 2.03e||a|| 2 ||6|| 2 . 

This yields 

|(z,y>| 2.03e 

< -, < 2.2e, 



which completes the proof. □ 
We are now ready to prove Theorem 13.11 

The proof is by induction on the iteration of ROMP. The induction claim is that 
for all previous iterations, the set of newly chosen indices Jo is nonempty, disjoint 
from the set of previously chosen indices J, and (13.11) holds. 

Let / be the set of previously chosen indices at the start of a given iteration. The 
induction claim easily implies that 

(3.2) |supp(v) U I\ < 2n. 

Let Jo, J> be the sets found by ROMP in the current iteration. By the definition of 
the set Jo, it is nonempty. 

Let r ^ be the residual at the start of this iteration. We shall approximate r by 
a vector in range($ supp ( 1 ,)\/). That is, we want to approximately realize the residual 
r as measurements of some signal which lives on the still unfound coordinates of the 
the support of v. To that end, we consider the subspace 

H := range($ supp („) U /) 

and its complementary subspaces 



F := range($/), E := range($ supp(t ,)\/). 
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The Restricted Isometry Condition in the form of Part 3 of Proposition 13.21 ensures 
that F and E are almost orthogonal. Thus E is close to the orthogonal complement 
of F in H, 

E:=F ± n H. 




We will also consider the signal we seek to identify at the current iteration, its 
measurements, and its observation vector: 

(3.3) v : = u| SU pp(i!)\/, := G E , u := $*x . 

Lemma [3.51 will show that ||(w — Mo)|t||2 for any small enough subset T is small, 
and Lemma 13.81 will show that ||w|j ||2 is not too small. First, we show that the 
residual r has a simple description: 

Lemma 3.3 (Residual). Here and thereafter, let Pl denote the orthogonal projection 
in R onto a linear subspace L. Then 

r = Pgx. 

Proof. By definition of the residual in the algorithm, r = P F ±x. Since x G H, we 
conclude from the orthogonal decomposition H = F + E that x = PpX + Pex. Thus 
r = x — Pfx = Pex. □ 

To guarantee a correct identification of Vq, we first state two approximation lem- 
mas that reflect in two different ways the fact that subspaces E and E are close to 
each other. This will allow us to carry over information from Eq to E. 

Lemma 3.4 (Approximation of the residual). We have 

\\%o - r\\ 2 < 2.2£||xo|| 2 - 

Proof. By definition of F, we have x — Xq = <&(v — vq) G F. Therefore, by Lemma I3~3l 
r = Pex = PeXq, and so 

Xq — r = xq — Pe%o — Pf%o — PfPe q Xq. 

Now we use Part 3 of Proposition 13.21 for the sets I and supp(t> ) \ I whose union 
has cardinality at most 2n by (13. 2p . It follows that ||-Pf-Peo x o||2 < 2.2e\\ 
desired. □ 
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Lemma 3.5 (Approximation of the observation). Consider the observation vectors 
Uq = $*x and u = $*r. Then for any set T C {1, . . . , d} with \T\ < 2n, we have 

||0o - «)|r||2 < 2.4e||wo|| 2 . 

Proof. Since Xq = $t>o, we have by Lemma E31 and the Restricted Isometry Condition 
that 

|bo-r|| 2 < 2.2£\\^vo\\ 2 < 2.2e(l + e)|K|| 2 < 2.3e|K|| 2 . 



To complete the proof, it remains to apply Part 2 of Proposition 13.21 which yields 
\\(u -u)\ T \\ 2 < (l + e )||a;o-r|| 2 . " ^ ' □ 

We next show that the energy (norm) of u when restricted to J, and furthermore 
to J Q , is not too small. By the approximation lemmas, this will yield that ROMP 
selects at least a fixed percentage of energy of the still unidentified part of the signal. 
By the regularization step of ROMP, since all selected coefficients have comparable 
magnitudes, we will conclude that not only a portion of energy but also of the 
support is selected correctly. This will be the desired conclusion. 

Lemma 3.6 (Localizing the energy). We have ||w|j||2 > 0.8||t>o|| 2 . 

Proof. Let S = supp(t> ) \ /. Since IS*! < n, the maximality property of J in the 
algorithm implies that 

IkoUlb > Iholslh- 
Furthermore, since vq\s = t>o, by Part 1 of Proposition 13.21 we have 

HuoMla > (l-2.03e)||uo|| 2 . 



Putting these two inequalities together and using Lemma 13.51 we conclude that 

IMj|| 2 > (1 - 2.03s) ||u || 2 - 2.4e||uo|| 2 > 0.8||u || 2 . 

This proves the lemma. □ 

We next bound the norm of u restricted to the smaller set Jo- We do this by first 
noticing a general property of regularization: 

Lemma 3.7 (Regularization). Let y be any vector in R m ; m > 1. Then there exists 
a subset Ac {1, . . . , m} with comparable coordinates: 

(3.4) |y(»)l<%0')l foralli,jeA, 
and with big energy: 

, , . . „ 1 
( 3 - 5 ) WvUh > n c n II 2/ II 2- 



2.5v/IoI 



m 



Proof. We will construct at most O(logm) subsets A/~ with comparable coordinates 
as in (13 .4p . and such that at least one of these sets will have large energy as in (13. 5p . 

Let y = (yi, . . . , y m ), and consider a partition of {1, ... , m} using sets with com- 
parable coordinates: 

A k := {z : 2- fc ||y|| 2 < \ Vi \ < 2- fc+1 ||y|| 2 }, k = 1, 2, . . . 
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Let ko = [logm] + 1, so that \yi\ < — \\yW2 for all i G A k , k > k Q . Then the set 
U = Ufc<fc contains most of the energy of y: 

11 11 ^ I I 1 II II \2\V 2 1 11 11 1 11 n 

\y\uA\2 < M — \\y 2) ) = -7= \\y 2 < — 7= 2/ 2- 



Thus ^ 

( Yl lbUJIi) 1/2 = II2/ J II 2 = (IMI2 - IMiHIl) 1 2 > TTjItoh- 

tl 
1 



fc<fco 

Therefore there exists k < fco such that 



v/2' 



IbUJb > -==||y|| 2 > _ 7, \\vh, 

V2fc 2.5vlogm 

which completes the proof. □ 

In our context, Lemma 1X71 applied to the vector u\j along with Lemma 131)1 directly 
implies: 

Lemma 3.8 (Regularizing the energy). We have 

0.32 .. M 

W Jo 2 > n M 2- 

Vlogn 

We now finish the proof of Theorem 13.11 

To show the first claim, that Jo is nonempty, we note that vq 7^ 0. Indeed, 
otherwise by (13.31) we have / C supp(f), so by the definition of the residual in the 
algorithm, we would have r = at the start of the current iteration, which is a 
contradiction. Then Jo 7^ by Lemma 13.81 

The second claim, that J n I = 0, is also simple. Indeed, recall that by the 
definition of the algorithm, r = P F ± e F x = (range($/)) _L . It follows that the 
observation vector u = $*r satisfies u\i = 0. Since by its definition the set J 
contains only nonzero coordinates of u we have J D / = 0. Since Jo C J, the second 
claim J fl / = follows. 

The nontrivial part of the theorem is its last claim, inequality (13. ip . Suppose it 
fails. Namely, suppose that | J fl supp(t>)| < \\Jq\, and thus 

|J \supp(w)| > ~|J |. 

Set A = J \supp(u). By the comparability property of the coordinates in J and 
since |A| > ||Jo|, there is a fraction of energy in A: 

(3-6) |Ma|| 2 > 4fII m UII2 > 7 n Ikolb, 

V 5 7 Vlog n 

where the last inequality holds by Lemma 13.81 

On the other hand, we can approximate u by Uq as 

(3.7) IMa||2 < ||w|a - W0UII2 + ll^oUlb- 

Since Ac J and using Lemma 13.51 we have 

IMa - W0UII2 < 2.4e||fo|| 2 
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Furthermore, by definition ( 13. 3ft of t>o, we have Vq\\ = 0. So, by Part 1 of Proposi- 
tion h 

IKUIh < 2.03£r||^ 1| 2- 
Using the last two inequalities in (13. 7} , we conclude that 

IkUlh < 4.43e||vo|| 2 - 

This is a contradiction to ( 13.61) so long as e < 0.03/Vlogn. This proves Theorem 13. II 

□ 

4. Implementation and empirical performance of ROMP 

4.1. Running time. The Identification step of ROMP, i.e. selection of the subset 
J, can be done by sorting the coordinates of u in the nonincreasing order and 
selecting n biggest. Many sorting algorithms such as Mergesort or Heapsort provide 
running times of 0(dlogd). 

The Regularization step of ROMP, i.e. selecting J C J, can be done fast by ob- 
serving that Jo is an interval in the decreasing rearrangement of coefficients. More- 
over, the analysis of the algorithm shows that instead of searching over all intervals 
Jo, it suffices to look for Jo among O(logn) consecutive intervals with endpoints 
where the magnitude of coefficients decreases by a factor of 2. (these are the sets 
Ak in the proof of Lemma [3.71) . Therefore, the Regularization step can be done in 
time 0(n). 

In addition to these costs, the k-th iteration step of ROMP involves multiplication 
of the d x N matrix $* by a vector, and solving the least squares problem with the 
N x \I\ matrix $/, where |/| < 2k < 2n. For unstructured matrices, these tasks 
can be done in time dN and 0(n 2 N) respectively. Since the submatrix of $ when 
restricted to the index set / is near an isometry, using an iterative method such 
as the Conjugate Gradient Method allows us to solve the least squares method in 
a constant number of iterations (up to a specific accuracy.) Using such a method 
then reduces the time of solving the least squares problem to just 0(nN). Thus in 
the cases where ROMP terminates after a fixed number of iterations, the total time 
to solve all required least squares problems would be just 0{nN). For structured 
matrices, such as partial Fourier, these times can be improved even more using fast 
multiply techniques. 

In other cases, however, ROMP may need more than a constant number of it- 
erations before terminating, say the full 0(n) iterations. In this case, it may be 
more efficient to maintain the QR factorization of and use the Modified Gram- 
Schmidt algorithm. With this method, solving all the least squares problems takes 
total time just 0(n 2 N). However, storing the QR factorization is quite costly, so 
in situations where storage is limited it may be best to use the iterative methods 
mentioned above. 

ROMP terminates in at most 2n iterations. Therefore, for unstructured matrices 
using the methods mentioned above and in the interesting regime N > \ogd, the 
total running time of ROMP is O(dNn). This is the same bound as for OMP [21] . 



14 



DEANNA NEEDELL AND ROMAN VERSHYNIN 



4.2. Non-sparse signals. In many applications, one needs to recover a signal v 
which is not sparse but close to being sparse in some way. Such are, for example, 
compressible signals, whose coefficients decay at a certain rate (see [7], [1]). To 
make ROMP work for such signals, one can replace the stopping criterion of exact 
recovery r = by "repeat n times or until r = 0, whichever occurs first" . Note that 
we could amend the algorithm for sparse signals in this way as well, allowing for a 
specific level of accuracy to be attained before terminating. 

We recently proved that ROMP is stable and guarantees approximate recovery of 
non-sparse signals with noisy measurements; this will be discussed in a forthcoming 
paper. 

4.3. Experiments. This section describes our experiments that illustrate the signal 
recovery power of ROMP. We experimentally examine how many measurements N 
are necessary to recover various kinds of n-sparse signals in M. d using ROMP. We 
also demonstrate that the number of iterations ROMP needs to recover a sparse 
signal is in practice at most linear the sparsity. 

First we describe the setup of our experiments. For many values of the ambient 
dimension d, the number of measurements N, and the sparsity n, we reconstruct 
random signals using ROMP. For each set of values, we generate aniVxd Gaussian 
measurement matrix $ and then perform 500 independent trials. The results we 
obtained using Bernoulli measurement matrices were very similar. In a given trial, 
we generate an n-sparse signal v in one of two ways. In either case, we first select the 
support of the signal by choosing n components uniformly at random (independent 
from the measurement matrix $). In the cases where we wish to generate flat signals, 
we then set these components to one0 In the cases where we wish to generate sparse 
compressible signals, we set the i th component of the support to plus or minus i~ l / p 
for a specified value of < p < 1. We then execute ROMP with the measurement 
vector x = 

Figure [1] depicts the percentage (from the 500 trials) of sparse flat signals that 
were reconstructed exactly. This plot was generated with d = 256 for various levels 
of sparsity n. The horizontal axis represents the number of measurements N, and 
the vertical axis represents the exact recovery percentage. We also performed this 
same test for sparse compressible signals and found the results very similar to those 
in Figure HJ Our results show that performance of ROMP is very similar to that of 
OMP which can be found in |21j . 

Figure [2] depicts a plot of the values for N and n at which 99% of sparse flat 
signals are recovered exactly. This plot was generated with d = 256. The horizontal 
axis represents the number of measurements N, and the vertical axis the sparsity 
level n. 

Theorem 11.31 guarantees that ROMP runs with at most 0(n) iterations. Figure [3] 
depicts the number of iterations executed by ROMP for d = 10, 000 and N = 200. 



Our work as well as the analysis of Gilbert and Tropp [3T] show that this is a challenging case 
for ROMP (and OMP). 
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ROMP was executed under the same setting as described above for sparse flat sig- 
nals as well as sparse compressible signals for various values of p, and the number 
of iterations in each scenario was averaged over the 500 trials. These averages were 
plotted against the sparsity of the signal. As the plot illustrates, only 2 iterations 
were needed for flat signals even for sparsity n as high as 40. The plot also demon- 
strates that the number of iterations needed for sparse compressible is higher than 
the number needed for sparse flat signals, as one would expect. The plot suggests 
that for smaller values of p (meaning signals that decay more rapidly) ROMP needs 
more iterations. However it shows that even in the case of p = 0.5, only 6 iterations 
are needed even for sparsity n as high as 20. 



Percentage of input signals recovered exactly (d=256) (Gaussian) 




Number of Measurements (N) 

Figure 1. The percentage of sparse flat signals exactly recovered by 
ROMP as a function of the number of measurements N in dimension 
d = 256 for various levels of sparsity n. 
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99% Recovery Trend, d=256 

45 I 1 1 1 




Measurements N 



Figure 2. The 99% recovery limit as a function of the sparsity n and 
the number of measurements N for sparse flat signals. 



Number of Iterations needed to reconstruct (d=1 0,000, N=200) (Gaussian) 




10 

Sparsity Level (n) 



Figure 3. The number of iterations executed by ROMP as a function 
of the sparsity n in dimension d = 10, 000 with N = 200. 
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